Additive Noise Mechanisms for Making Randomized Approximation Algorithms Differentially Private
The exponential increase in the amount of available data makes taking
advantage of it without violating users' privacy one of the fundamental
problems of computer science for the 21st century. This question has been
investigated thoroughly under the framework of differential privacy. However,
most of the literature has not focused on settings where the amount of data is
so large that we are not able to compute the exact answer even in the
non-private setting (such as the streaming setting, the sublinear-time setting,
etc.). This can often make the use of differential privacy infeasible in
practice. In this paper, we show a general approach for making Monte-Carlo
randomized approximation algorithms differentially private. We only need to
assume that the error of the approximation algorithm is sufficiently
concentrated around 0 (e.g., that its first absolute moment is bounded) and
that the function being approximated has a small global sensitivity.
First, we show that if the error is subexponential, then the Laplace
mechanism with noise magnitude proportional to the sum of the global
sensitivity and the \emph{subexponential diameter} of the error of the
algorithm makes the algorithm differentially private. This is true even if the
worst-case global sensitivity of the algorithm is large or even infinite. We
then introduce a new additive noise mechanism, which we call the
zero-symmetric Pareto mechanism. We show that using this mechanism, we can
make an algorithm differentially private even if we only assume a bound on the
first absolute moment of the error.
Finally, we use our results to give the first differentially private
algorithms for various problems. This includes results for frequency moments,
estimating the average degree of a graph in sublinear time, and estimating the
size of the maximum matching. Our results raise many new questions, and we
state multiple open problems.
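To make the first mechanism concrete, here is a minimal sketch of the Laplace-noise approach described above: noise is added with a scale combining the global sensitivity and a concentration parameter of the approximation error, divided by the privacy parameter. The function names, the `error_diameter` parameter, and the omitted constant factors are illustrative assumptions, not the paper's exact mechanism.

```python
import random

def laplace_noise(scale, rng=random):
    """Draw one sample from the Laplace distribution with mean 0 and the given scale."""
    # A Laplace variable is an exponentially distributed magnitude with a random sign.
    magnitude = rng.expovariate(1.0 / scale)
    return magnitude if rng.random() < 0.5 else -magnitude

def private_estimate(approx_value, sensitivity, error_diameter, eps, rng=random):
    """Hypothetical privatization of a Monte-Carlo approximation: add Laplace
    noise with scale (sensitivity + error_diameter) / eps, where error_diameter
    stands in for the subexponential diameter of the algorithm's error.
    Constant factors from the actual mechanism are omitted."""
    scale = (sensitivity + error_diameter) / eps
    return approx_value + laplace_noise(scale, rng)
```

For instance, a sublinear-time approximate count with global sensitivity 1 and error diameter `d` would be released as `private_estimate(count, 1.0, d, eps)`.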
Massively Parallel Computation and Sublinear-Time Algorithms for Embedded Planar Graphs
While algorithms for planar graphs have received a lot of attention, few
papers have focused on the additional power that one gets from assuming an
embedding of the graph is available. Although in the classic sequential
setting this assumption gives no additional power (as a planar graph can be
embedded in linear time), we show that this is far from being the case in
other settings.
We assume that the embedding is straight-line, but our methods also generalize
to non-straight-line embeddings. Specifically, we focus on sublinear-time
computation and massively parallel computation (MPC).
Our main technical contribution is a sublinear-time algorithm for computing a
relaxed version of an r-division. We then show how this can be used to
estimate Lipschitz additive graph parameters. This includes, for example, the
size of a maximum matching, a maximum independent set, or a minimum dominating
set. We also show how this can be used to solve some property-testing problems
with respect to the vertex edit distance.
In the second part of our paper, we show an MPC algorithm that computes an
r-division of the input graph. We show how this can be used to solve various
classical graph problems with space per machine of O(n^(2/3 + eps)) for some
eps > 0, while performing O(1) rounds. This includes, for example, approximate
shortest paths and the minimum spanning tree. Our results also imply an
improved MPC algorithm for the Euclidean minimum spanning tree.
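To convey why a division helps estimate Lipschitz additive parameters, here is a toy sketch: compute the parameter independently on each piece and sum the results; for such parameters, ignoring the edges between pieces changes the answer by at most the number of boundary vertices. The parameter here (the size of a greedy maximal matching, a 2-approximation of maximum matching) and the piece representation are illustrative assumptions, not the paper's algorithm.

```python
def greedy_matching_size(adj, piece):
    """Size of a greedy maximal matching of the subgraph induced by `piece`.
    `adj` maps each vertex to a list of its neighbors."""
    piece_set = set(piece)
    matched = set()
    size = 0
    for u in piece:
        if u in matched:
            continue
        for v in adj[u]:
            if v in piece_set and v not in matched and v != u:
                matched.add(u)
                matched.add(v)
                size += 1
                break
    return size

def estimate_from_division(adj, pieces):
    """Sum the per-piece values. For a Lipschitz additive parameter, the
    error incurred by cutting the graph into pieces is bounded by the
    total number of boundary vertices of the division."""
    return sum(greedy_matching_size(adj, piece) for piece in pieces)
```

On a 6-vertex path cut into two pieces, the estimate is 2 while the true maximum matching has size 3; the gap reflects the boundary edge lost at the cut plus the greedy approximation.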
Estimating the Effective Support Size in Constant Query Complexity
Estimating the support size of a distribution is a well-studied problem in
statistics. Motivated by the fact that this problem is highly non-robust (as
small perturbations of a distribution can drastically affect its support
size) and thus hard to estimate, Goldreich [ECCC 2019] studied the query
complexity of estimating the ε-\emph{effective support size}
of a distribution P, which is equal to the smallest
support size of a distribution that is within total variation distance ε
of P.
In his paper, he shows an algorithm in the dual access setting (where we may
both receive random samples from P and query the sampling probability P(x)
for any element x) that computes a bicriteria approximation: the returned
value is guaranteed to lie between the effective support sizes for two nearby
values of ε, up to a multiplicative factor. However, his algorithm has either
query complexity that is super-constant in the support size or a
super-constant approximation ratio. He then asked whether this is necessary,
or whether it is possible to get a constant-factor approximation in a number
of queries independent of the support size.
We answer his question affirmatively: query complexity independent of the
support size is possible not only for a constant-factor approximation, but
even without the bicriteria relaxation. Specifically, we show an algorithm
whose query complexity depends only on ε and the desired accuracy, and which
outputs an approximation of the ε-effective support size. We also show that
the approximation ratio can be traded off against the query complexity. Our
algorithm is very simple and takes only a few lines of pseudocode.
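The quantity itself is easy to compute when the full distribution is known: the ε-effective support size is the smallest k such that the k largest probabilities carry at least 1 - ε of the mass, since zeroing out the remaining tail moves the distribution by at most ε in total variation distance. The snippet below is a definitional toy computation, not the sublinear-query algorithm from the paper.

```python
def effective_support_size(probs, eps):
    """Smallest k such that the top-k probabilities sum to at least 1 - eps.
    `probs` is the full probability vector of the distribution."""
    total = 0.0
    for k, p in enumerate(sorted(probs, reverse=True), start=1):
        total += p
        if total >= 1.0 - eps - 1e-12:  # small slack for float rounding
            return k
    return len(probs)
```

For example, for the distribution (0.5, 0.3, 0.1, 0.05, 0.05) and ε = 0.2, the two largest probabilities already carry mass 0.8 = 1 - ε, so the ε-effective support size is 2.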
Sampling and Counting Edges via Vertex Accesses
We consider the problems of sampling and counting edges in a graph on n
vertices and m edges, where our basic access is via uniformly sampled
vertices. When we have a vertex, we can see its degree and access its
neighbors. Eden and Rosenbaum [SOSA 2018] have shown that it is possible to
sample an edge ε-uniformly (that is, from a distribution pointwise ε-close to
uniform) in O(n/√(εm)) vertex accesses. Here, we get this down to
O((n/√m) · log(1/ε)) expected vertex accesses. Next, we
consider the problem of sampling multiple edges. For this, we introduce a
model that we call hash-based neighbor access. We show that, w.h.p., we can
sample s edges exactly uniformly at random, with or without replacement, in a
number of vertex accesses sublinear in n for s not too large. We present a
matching lower bound, which holds for ε-uniform edge multi-sampling with some
constant ε > 0 even though our positive result samples exactly uniformly
(ε = 0).
We then give an algorithm for edge counting. W.h.p., we count the number of
edges to within a 1 ± ε factor. When ε is not too small, we present a
near-matching lower bound; in the same range, the previous best upper and
lower bounds were polynomially worse in ε.
Finally, we give an algorithm that, instead of hash-based neighbor access,
uses the more standard pair queries ("are vertices u and v adjacent?").
W.h.p., it returns a 1 ± ε approximation of the number of edges, and its
expected running time matches our lower bound when ε is not too small.
Comment: This paper subsumes the arXiv report (arXiv:2009.11178), which only
contains the result on sampling one edge.
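As a baseline for what vertex accesses can do, a uniformly random edge can be sampled by rejection against the maximum degree: pick a uniform vertex, accept it with probability proportional to its degree, then output a uniform neighbor. This simple d_max-based sampler (sketched below, with assumed accessor names) needs Θ(n · d_max / m) accesses in expectation and is only a baseline; the algorithms in the paper are substantially more efficient.

```python
import random

def sample_edge(vertices, adj, d_max, rng=random):
    """Sample a uniformly random directed edge (u, v) by rejection sampling.
    In each round, a given directed edge (u, v) is output with probability
    (1/n) * (deg(u)/d_max) * (1/deg(u)) = 1/(n * d_max),
    which does not depend on the edge, so the output is exactly uniform."""
    while True:
        u = rng.choice(vertices)            # one uniform vertex access
        deg = len(adj[u])                   # degree query
        if deg and rng.random() < deg / d_max:
            v = adj[u][rng.randrange(deg)]  # uniform neighbor access
            return (u, v)
```

The acceptance step is what removes the bias toward high-degree vertices that naive "vertex plus random neighbor" sampling would introduce.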
CountSketches, Feature Hashing and the Median of Three
In this paper, we revisit the classic CountSketch method, a sparse random
projection that transforms a (high-dimensional) Euclidean vector v into a
vector of dimension t · s, where t and s are integer parameters. It is known
that even for t = 1, a CountSketch allows estimating coordinates of v with
variance bounded by ||v||_2^2 / s. For t > 1, the estimator takes the median
of the t independent row estimates, and the probability that the estimate is
off by more than a constant multiple of ||v||_2 / √s is exponentially small
in t. This suggests choosing t to be logarithmic in a desired inverse failure
probability. However, implementations of CountSketch often use a small,
constant t. Previous work only predicts a constant-factor improvement in this
setting.
Our main contribution is a new analysis of CountSketch, showing that when the
median of t > 1 independent estimates is used, the variance improves by a
factor proportional to t, asymptotically for large enough t. We also study
the variance in the setting where an inner product is to be estimated from
two CountSketches. This finding suggests that the Feature Hashing method,
which is essentially identical to CountSketch but does not make use of the
median estimator, can be made more reliable at a small cost in settings where
using a median estimator is possible.
We confirm our theoretical findings in experiments and thereby help justify
why a small constant number of estimates often suffices in practice. Our
improved variance bounds are based on new general theorems about the variance
and higher moments of the median of i.i.d. random variables, which may be of
independent interest.
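A compact sketch of CountSketch with the median estimator, using the notation above (t independent rows with s buckets each; a random bucket and a random sign per coordinate in every row). This is a textbook-style illustration, not the paper's experimental code.

```python
import random
import statistics

class CountSketch:
    def __init__(self, t, s, dim, seed=0):
        rng = random.Random(seed)
        self.t, self.s = t, s
        # For each of the t rows: a random bucket and a random sign per coordinate.
        self.bucket = [[rng.randrange(s) for _ in range(dim)] for _ in range(t)]
        self.sign = [[rng.choice((-1, 1)) for _ in range(dim)] for _ in range(t)]
        self.table = [[0.0] * s for _ in range(t)]

    def add(self, v):
        """Fold the vector v into the sketch."""
        for r in range(self.t):
            for j, x in enumerate(v):
                self.table[r][self.bucket[r][j]] += self.sign[r][j] * x

    def estimate(self, j):
        """Estimate coordinate j: each row gives an unbiased estimate
        (the signed content of j's bucket); the median makes it robust."""
        return statistics.median(
            self.sign[r][j] * self.table[r][self.bucket[r][j]]
            for r in range(self.t)
        )
```

Feature Hashing corresponds to the t = 1 case, where the median disappears and only the single-row estimate remains.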